Annealing Techniques For Unsupervised Statistical Language Learning
Abstract
Exploiting unannotated natural language data is hard largely because unsupervised parameter estimation is hard. We describe deterministic annealing (DA; Rose et al., 1990) as an appealing alternative to the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). Seeking to avoid search error, DA begins by globally maximizing an easy concave function and maintains a local maximum as it gradually morphs the function into the desired non-concave likelihood function. Applying DA to parsing and tagging models is shown to be straightforward; significant improvements over EM are shown on a part-of-speech tagging task. We describe a variant, skewed DA, which can incorporate a good initializer when it is available, and show significant improvements over EM on a grammar induction task.
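The annealing idea in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a one-dimensional Gaussian mixture with unit variance instead of a tagging or parsing model, and it uses a common formulation of DA for EM-style models in which the E-step posterior is proportional to the joint score raised to a power beta. The names `tempered_posteriors`, `da_em`, `beta_start`, `beta_growth`, `inner_iters`, and `jitter` are illustrative assumptions, as is the small random jitter added at each temperature so that components can separate.

```python
import math
import random


def tempered_posteriors(x, means, weights, beta):
    """E-step for one point: posterior proportional to (w_k * N(x | mu_k, 1)) ** beta."""
    log_scores = [beta * (math.log(w) - 0.5 * (x - m) ** 2)
                  for m, w in zip(means, weights)]
    top = max(log_scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def da_em(data, k=2, beta_start=0.01, beta_growth=1.5, inner_iters=20,
          jitter=1e-3, seed=0):
    """Toy deterministic annealing: EM with tempered posteriors, beta raised to 1."""
    rng = random.Random(seed)
    center = sum(data) / len(data)
    # All components start at the data mean; at small beta the start barely matters.
    means = [center] * k
    weights = [1.0 / k] * k
    beta = beta_start
    while True:
        # Assumption of this sketch: a tiny jitter at each temperature breaks symmetry.
        means = [m + jitter * rng.uniform(-1, 1) for m in means]
        for _ in range(inner_iters):
            post = [tempered_posteriors(x, means, weights, beta) for x in data]
            for j in range(k):
                mass = sum(p[j] for p in post)
                means[j] = sum(p[j] * x for p, x in zip(post, data)) / mass
                weights[j] = mass / len(data)
        if beta >= 1.0:
            return means, weights  # the last stage is ordinary EM (beta = 1)
        beta = min(1.0, beta * beta_growth)
```

At very small beta the posteriors are nearly uniform and every component settles at the data mean; as beta grows, the components separate, and the final stage at beta = 1 runs standard EM from the annealed parameters. Calling `da_em(data)` on a list of floats returns the estimated means and mixing weights.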
Similar resources
On Unsupervised Learning of Mixtures of Markov Sources (Thesis submitted for the degree "Master of Science")
Unsupervised classification, or clustering, is one of the basic problems in data analysis. While the problem of unsupervised classification of independent random variables has been deeply investigated, the problem of unsupervised classification of dependent random variables, and in particular the problem of segmentation of mixtures of Markov sources, has been hardly addressed. At the same time sup...
Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into gram...
Multiscale Annealing for Real-Time Unsupervised Texture Segmentation
We derive real-time global optimization methods for several clustering optimization problems commonly used in unsupervised texture segmentation. Speed is achieved by exploiting the image neighborhood relation of features to design a multiscale optimization technique, while accuracy and global optimization properties are gained using annealing techniques. Coarse-grained cost functions are derive...
Learning Constructions of Natural Language: Statistical Models and Evaluations
Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi. Author: Sami Virpioja. Name of the doctoral dissertation: Learning Constructions of Natural Language: Statistical Models and Evaluations. Publisher: School of Science. Unit: Department of Information and Computer Science. Series: Aalto University publication series DOCTORAL DISSERTATIONS 158/2012. Field of research: Computer and Information Sci...
Robust Unsupervised Clustering Using Generalized Annealing M-estimator
A new robust clustering algorithm, called generalized annealing M-estimator (GAM-estimator), is proposed. Initialized with multiple seeds, the GAM-estimator converges to several optimal cluster centers. Neither knowledge about the number of clusters nor scale is needed. The global optimal solution of clustering is achieved by minimization of an objective function. The algorithm is applied to un...
Publication date: 2004